Probing the lexicon in evaluating commercial MT systems
Abstract
In the past, the evaluation of machine translation systems focused on single-system evaluations because only a few systems were available. But now there are several commercial systems for the same language pair. This requires new methods of comparative evaluation. In this paper we propose a black-box method for comparing the lexical coverage of MT systems. The method is based on lists of words from different frequency classes. It is shown how these word lists can be compiled and used for testing. We also present the results of using our method on 6 MT systems that translate between English and German.

1 Introduction

The evaluation of machine translation (MT) systems has been a central research topic in recent years (cp. (Sparck-Jones and Galliers, 1995; King, 1996)). Many suggestions have focused on measuring the translation quality (e.g. error classification in (Flanagan, 1994) or post-editing time in (Minnis, 1994)). These measures are time-consuming and difficult to apply. But translation quality rests on the linguistic competence of the MT system, which in turn is based first and foremost on grammatical coverage and lexicon size. Testing grammatical coverage can be done by using a test suite (cp. (Nerbonne et al., 1993; Volk, 1995)). Here we will advocate a probing method for determining the lexical coverage of commercial MT systems. We have evaluated 6 MT systems which translate between English and German and which are all positioned in the low price market (under US$ 1500).
• German Assistant in Accent Duo V2.0 (developer: MicroTac/Globalink; distributor: Accent)
• Langenscheidts T1 Standard V3.0 (developer: GMS; distributor: Langenscheidt)
• Personal Translator plus V2.0 (developer: IBM; distributor: von Rheinbaben & Busch)
• Power Translator Professional (developer/distributor: Globalink)¹
• Systran Professional for Windows (developer: Systran S.A.; distributor: Mysoft)
• Telegraph V1.0 (developer/distributor: Globalink)

The overall goal of our evaluation was a comparison of these systems resulting in recommendations on which system to apply for which purpose. The evaluation consisted of compiling a list of criteria for self evaluation and three experiments with external volunteers, mostly students from a local interpreter school. These experiments were performed to judge the information content of the translations, the translation quality, and the user-friendliness. The list of criteria for self evaluation consisted of technical, linguistic and ergonomic issues. As part of the linguistic evaluation we wanted to determine the lexical coverage of the MT systems since only some of the systems provide figures on lexicon size in the documentation.

Many MT system evaluations in the past have been white-box evaluations performed by a testing team in cooperation with the developers (see (Falkedal, 1991) for a survey). But commercial MT systems can only be evaluated in a black-box setup since the developer typically will not make the source code and even less likely the linguistic source data (lexicon and grammar) available. Most of the evaluations described in the literature have centered around one MT system. But there are

¹ Recently a newer version has been announced as "Power Translator Pro 6.2".
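The abstract describes the method only in outline: word lists drawn from different frequency classes are fed to each system as a black box. As a rough illustration of that idea, the following Python sketch groups the word types of a corpus into frequency classes, samples a test list from each class, and computes a coverage ratio. The function names, the class thresholds, and the `known` predicate are all hypothetical; how the authors actually compiled their lists and how they decided that a system knows a word is not stated in this excerpt.

```python
from collections import Counter
import random


def frequency_classes(tokens, boundaries=(100, 10, 1)):
    """Group word types by corpus frequency.

    `boundaries` are descending minimum counts. The default values are
    placeholders chosen for illustration, not thresholds from the paper.
    """
    counts = Counter(t.lower() for t in tokens)
    classes = {b: [] for b in boundaries}
    for word, n in counts.items():
        for b in boundaries:
            if n >= b:
                # Assign the word to the highest class whose threshold it meets.
                classes[b].append(word)
                break
    return classes


def sample_test_list(classes, per_class, seed=0):
    """Draw up to `per_class` words from every frequency class."""
    rng = random.Random(seed)
    return {
        b: rng.sample(sorted(words), min(per_class, len(words)))
        for b, words in classes.items()
    }


def coverage(test_words, known):
    """Share of test words the system handles.

    `known` is a caller-supplied predicate (an assumption of this sketch),
    for example a manual judgment of each system's output for the word.
    """
    if not test_words:
        return 0.0
    return sum(1 for w in test_words if known(w)) / len(test_words)
```

In use, one would run each sampled list through every MT system, record per word whether a translation was produced, and compare the resulting coverage figures per frequency class across systems.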
Similar resources
Types of Semantic Information Necessary in a Machine Translation Lexicon
This paper describes research undertaken into assessing what types of semantic information (SI) are needed in a Machine Translation (MT) lexicon in order for ‘good’ translation quality to be attainable. We present a typology of semantic information, allowing the use of semantics in any MT system to be quantified in precise and absolute, rather than relative, terms. This typology was used to sur...
Interlingua Developed and Utilized in Real Multilingual MT Product Systems
This paper describes characteristics of an interlingua we have developed. It contains a large lexicon and has been tested on actual MT systems in the translation of large volumes of actual documents. The main characteristics of the interlingua are as follows: (1) Conceptual primitives, elements of the interlingua, can be linked to any parts of speech in English or Japanese. (2) Positions of the...
Towards a shared task for shallow semantics-based translation (in an industrial setting)
In the Lingenio analysis systems, the sentences are analyzed into syntactic slot grammar representations from which so-called 'dependency trees' are derived which reduce the analyses to the semantically relevant nodes and decorate these by information from the semantic lexicon. Slot grammar is a unification-based dependency grammar (cf. McCord 89). It has been used in the Logic based Machine ...
Designing A Mixed System of Network DEA for Evaluating the Efficiency of Branches of Commercial Banks in Iran
One of the most important applications of data envelopment analysis technique is measuring the efficiency of bank branches. Performance measurement in the banking industry is important for several groups, including bank managers, customers, investors, and shareholders. The purpose of this study is to examine and design a mixed structure to measure the efficiency of branches of Iranian banks a...
Publication date: 2002